Some clean up and exploration:

library(knitr)
library(tidyverse)

cocktails <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/cocktails.csv')

# Converting every character to lower case:
cocktails <- cocktails %>% 
  mutate(alcoholic = str_to_lower(alcoholic),
         glass = str_to_lower(glass),
         ingredient = str_to_lower(ingredient))

cocktails %>% 
  filter(ingredient_number == 1) %>% 
  count(alcoholic, category, glass, sort = TRUE)
## # A tibble: 92 x 4
##    alcoholic     category         glass                   n
##    <chr>         <chr>            <chr>               <int>
##  1 alcoholic     Ordinary Drink   cocktail glass         90
##  2 alcoholic     Ordinary Drink   highball glass         51
##  3 alcoholic     Ordinary Drink   collins glass          50
##  4 alcoholic     Ordinary Drink   old-fashioned glass    43
##  5 alcoholic     Shot             shot glass             37
##  6 alcoholic     Cocktail         cocktail glass         29
##  7 non alcoholic Other/Unknown    highball glass         20
##  8 alcoholic     Ordinary Drink   whiskey sour glass     16
##  9 alcoholic     Homemade Liqueur collins glass          10
## 10 alcoholic     Ordinary Drink   champagne flute        10
## # ... with 82 more rows
cocktails %>% 
  count(alcoholic, ingredient, sort = TRUE)
## # A tibble: 403 x 3
##    alcoholic ingredient       n
##    <chr>     <chr>        <int>
##  1 alcoholic vodka           87
##  2 alcoholic gin             84
##  3 alcoholic orange juice    51
##  4 alcoholic lemon juice     50
##  5 alcoholic sugar           49
##  6 alcoholic lemon           44
##  7 alcoholic light rum       42
##  8 alcoholic amaretto        39
##  9 alcoholic triple sec      38
## 10 alcoholic grenadine       34
## # ... with 393 more rows

Gin, Vodka, Orange & Lemon Juice are most common ingredients. Using these counts can do bar plot stuff but maybe later, I thought to implement each of my last semester learnings in upcoming #tidytuesday datasets.

Association Rule Learing:

Need to transform data into transaction type matrix which can then be directly used in arules R package. Single cocktail has as many rows as its ingredients, so keeping only single rows for each cocktail with all ingredients stored as list.

library(arules)
singles <-  cocktails %>%
  mutate(ingredient = factor(ingredient)) %>% 
  group_by(row_id) %>% 
  summarize(ingredlist = list(ingredient))

#Attaching name to each set of ingredient with cocktail's row_id
names(singles$ingredlist) <- singles$row_id

Some cocktails have duplicate ingredients in data, like below:
ingredient ‘food color’ is present multiple times having different values in ‘measure’)
Hence the warning.

ingreds <- as(singles$ingredlist, "transactions")
## Warning in asMethod(object): removing duplicated items in transactions
rules <- apriori(ingreds, parameter = list(support = 0.01, confidence = 0.2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 5 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[306 item(s), 546 transaction(s)] done [0.00s].
## sorting and recoding items ... [91 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [88 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
#After multiple threshold setting, choosing 0.01 and 0.2 as it returns suitable amount of rules.
summary(rules)
## set of 88 rules
## 
## rule length distribution (lhs + rhs):sizes
##  2  3  4 
## 66 18  4 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   2.000   2.000   2.295   2.250   4.000 
## 
## summary of quality measures:
##     support          confidence        coverage            lift       
##  Min.   :0.01099   Min.   :0.2045   Min.   :0.01282   Min.   : 1.362  
##  1st Qu.:0.01282   1st Qu.:0.2848   1st Qu.:0.02152   1st Qu.: 2.418  
##  Median :0.01465   Median :0.4361   Median :0.03755   Median : 4.234  
##  Mean   :0.01642   Mean   :0.4936   Mean   :0.04225   Mean   : 7.331  
##  3rd Qu.:0.01832   3rd Qu.:0.6875   3rd Qu.:0.05128   3rd Qu.: 7.913  
##  Max.   :0.04212   Max.   :1.0000   Max.   :0.12637   Max.   :34.125  
##      count       
##  Min.   : 6.000  
##  1st Qu.: 7.000  
##  Median : 8.000  
##  Mean   : 8.966  
##  3rd Qu.:10.000  
##  Max.   :23.000  
## 
## mining info:
##     data ntransactions support confidence
##  ingreds           546    0.01        0.2
sub_rules <- inspect(head(rules, n = 10, by = "lift"))
kable(sub_rules)
lhs rhs support confidence coverage lift count
[1] {lemon juice,orange,sugar} => {maraschino cherry} 0.0146520 1.0000000 0.0146520 34.12500 8
[2] {lemon juice,orange} => {maraschino cherry} 0.0146520 0.8000000 0.0183150 27.30000 8
[3] {orange,sugar} => {maraschino cherry} 0.0201465 0.7333333 0.0274725 25.02500 11
[4] {lemon juice,maraschino cherry} => {orange} 0.0146520 1.0000000 0.0146520 23.73913 8
[5] {maraschino cherry,sugar} => {orange} 0.0201465 1.0000000 0.0201465 23.73913 11
[6] {lemon juice,maraschino cherry,sugar} => {orange} 0.0146520 1.0000000 0.0146520 23.73913 8
[7] {lemon juice,sugar} => {maraschino cherry} 0.0146520 0.6153846 0.0238095 21.00000 8
[8] {light cream} => {creme de cacao} 0.0109890 0.3750000 0.0293040 20.47500 6
[9] {creme de cacao} => {light cream} 0.0109890 0.6000000 0.0183150 20.47500 6
[10] {carbonated water,lemon} => {powdered sugar} 0.0128205 1.0000000 0.0128205 19.50000 7

Package arulesViz gives an extensive options for visual representations of association rules:

library(arulesViz)
library(htmlwidgets)
library(htmltools)
top_rules <- head(rules, n = 10, by = "lift")
p <- plot(rules, method = "graph", engine = "htmlwidget")
#Making it pretty with html tags and transparency taken care using manual css:
p <- prependContent(p, tags$h1("Relation between Ingredients of Cocktails")) 
p <- prependContent(p, tags$p("Size corresponds to Support")) 
p <- prependContent(p, tags$p("Color corresponds to Lift"))
p <- prependContent(p, tags$p("Scroll up/down to Zoom and Hover for details"))
p <- prependContent(p, tags$script('document.body.style.backgroundImage = "url(https://static.vecteezy.com/system/resources/previews/000/444/109/non_2x/vector-alcohol-cocktails-icons-black.jpg)"'))
p

Relation between Ingredients of Cocktails

Size corresponds to Support

Color corresponds to Lift

Scroll up/down to Zoom and Hover for details

Another representations of rules can be through Matrix visualisation of antecedent and consequent itemsets forming the columns and rows of matrix, respectively. Rules with the highest lift are placed in the top-right corner.

plot(rules, method = "matrix", engine = "htmlwidget")